Chapter 1 — More Data, Better
Tools, More Opportunity


This chapter looks at the rise of data and discusses how both 
data and data infrastructure have changed over the years. The 
chapter also provides a look at some examples that show how 
data integration has been used to solve problems in several 
different industries. 


Chapter 2 — Data Integration 101 


This chapter introduces you to some common data integration
terminology and offers a basic understanding of how data
integration works.


Chapter 3 — Understanding Data 
Integration Challenges 


This chapter examines the challenges that organizations face 
in implementing data integration projects. 


Chapter 4 — Understanding the 
Benefits of Data Integration 
Tools 


This chapter shows you how modern data integration tools 
address the challenges and provide benefits such as increased 
efficiency, better decision making, and faster development of 
the solutions you need. 


Chapter 5 — Top Ten Things to Look 
For in a Data Integration Tool 


Finally, this chapter provides some tips on what to look for in 
your data integration solution. 
Chapter 1: More Data, Better Tools, More Opportunity


Examining Some Data 
Integration Examples 


Data integration may sound interesting on its own, but there’s
nothing like a good real-world example to show its true value.
This section looks at how several types of organizations can
use data integration to improve their services and bottom
line, with a use case for each.


Healthcare 


By integrating legacy data with social media data, health
organizations can make better predictions about the spread
of contagious diseases and thus make more informed decisions
to protect public health. Even a day or two saved in getting
vaccines to the right location or implementing a quarantine
can make a huge difference in limiting the extent of a
health crisis.


Very recently, researchers at a renowned East-Coast university 
experimented with a new approach aimed at providing more 
immediate information about the current status of flu infections. 
Rather than relying on traditional data sources, the researchers 
used advanced algorithms to look at unstructured data from 
Twitter feeds. The researchers analyzed hundreds of thousands 
of tweets to determine locations where people had the flu. This 
social media data provided an up-to-the-minute and predictive 
look at where the flu was spreading at a particular point in time. 
Without integrating the new types of data from social media 
with the existing legacy data, it simply wouldn’t be possible to 
gain this quick insight. 


In the past, the data available for tracking flu outbreaks has
been traditional, legacy-type data that only tells you what
happened after the fact, once the data has been collected and
analyzed. In other words, you can see what has happened in
various parts of the country, but because the information is
at least several weeks old, it’s not very useful for predicting
where you’ll need to send additional supplies of flu vaccine
or where you’ll need to ramp up staffing levels at clinics and
hospitals in order to stem the epidemic.
Financial


Financial institutions are also finding that they need the power 
of data integration to better compete in today’s market. They 
need to understand who their customers are and how to 
deliver services that fit their specific needs. 


A large bank wanted to offer better, more customer-oriented
services that required it to rapidly access and integrate existing
customer, product, and activity data across multiple business
applications and legacy transactional systems. The bank
also wanted to be able to find and fix data quality problems,
such as incorrect customer address data, duplicates, misspellings,
and inconsistent values.
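
The kinds of data quality problems listed above (duplicates, misspellings, and inconsistent values) can be illustrated with a small Python sketch. It is not the bank's actual method. The record fields, the sample data, and the rule that two records are duplicates when their cleaned-up email addresses match are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions.
CUSTOMERS = [
    {"id": 1, "name": "Jane  Doe", "email": "Jane.Doe@Example.com", "state": "calif."},
    {"id": 2, "name": "jane doe", "email": "jane.doe@example.com ", "state": "CA"},
    {"id": 3, "name": "Raj Patel", "email": "raj@example.com", "state": "New York"},
]

# Assumed mapping from inconsistent spellings to one canonical code.
STATE_ALIASES = {
    "calif.": "CA", "california": "CA", "ca": "CA",
    "new york": "NY", "ny": "NY",
}


def normalize(record):
    """Return a cleaned copy of one customer record."""
    name = re.sub(r"\s+", " ", record["name"]).strip().title()
    email = record["email"].strip().lower()
    state_key = record["state"].strip().lower()
    # Unknown spellings are kept (trimmed) rather than guessed at.
    state = STATE_ALIASES.get(state_key, record["state"].strip())
    return {**record, "name": name, "email": email, "state": state}


def find_duplicates(records):
    """Group record ids that share the same normalized email address."""
    by_email = defaultdict(list)
    for rec in map(normalize, records):
        by_email[rec["email"]].append(rec["id"])
    return {email: ids for email, ids in by_email.items() if len(ids) > 1}
```

Calling find_duplicates(CUSTOMERS) groups records 1 and 2 under the same email address, even though their names and state values were typed differently. Real tools typically use fuzzier matching than an exact email comparison.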


To meet these goals, the bank embarked on an enterprise-wide 
data integration and data quality strategy to reorient around 
the customer. By integrating data from different disconnected 
sources, the bank was able to understand not only what each 
customer valued but also the value the customer represented 
to the bank’s bottom line. The result of using data integration 
was better business intelligence and customer knowledge to 
segment customer audiences, tailor business streams, deliver 
value to customers, and target the customers that would 
deliver the bank more revenue. 


Retail/B2B


Retail organizations need to adapt to an ever more competitive 
marketplace. Quite simply, customers have more options, so 
stores need to be able to provide better service to remain 
competitive. And they need their systems to be organized and 
up to date so internal teams aren’t working at odds or hunting 
for information that should be easy to find. 


A large business product sales group faced the challenge of
creating a unified sales order management system. The sales
professionals responsible for business solutions relied on a
CRM system in the cloud for their selling strategy. However,
all the other crucial sales data, including prospects, sales
orders, and devices deployed out in the field, resided in the
company’s on-premises enterprise resource planning (ERP)
platform.
In any discussion of data integration, you’re likely to hear
numerous mentions of big data (see Chapter 2 for more information
on that). In reality, what many would call big data is
no more of a challenge than any other data. Sure, there’s more
volume, but the right data integration tools make handling
that larger volume just about as easy as handling any data.
Once everyone gets used to the fact that data went big and is
going to stay big, the terminology will likely change from big
data to data.


More important than whether or not the data is big data is 
the question of why you want to move or aggregate your data. 
There can be many answers to this question, such as a desire 
to improve marketing efforts or the need to comply with new 
regulations, but fundamentally, data integration consolidates 
data from a number of different sources and formats in order 
to produce useful business information that better illustrates 
the bigger picture. 


The technical challenges to data integration are often lumped
into the three Vs: variety, volume, and velocity. A number
of people talk about veracity as a fourth V. A few people are
starting to talk about value as the fifth V. All five are important
and should be vital to your planning process. The following
sections discuss the specifics of these five categories.


Variety 


The first technical challenge to data integration is the variety
of data. Today, many kinds of data exist, ranging from highly
structured data found in legacy mainframe databases, to
semi-structured data that follows various industry standards such
as XML, to relatively unstructured data such as web content
and social media comments. (This is discussed in more detail
in Chapter 2.)
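
To make the variety challenge concrete, here is a minimal Python sketch that reads structured CSV rows and semi-structured XML and maps both onto one common record shape. The sample inputs, the field and tag names, and the integrate function are hypothetical, and only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inputs; column and tag names are assumptions.
CSV_TEXT = "customer_id,amount\n101,25.50\n102,40.00\n"
XML_TEXT = (
    "<orders><order><customer>103</customer>"
    "<total>12.75</total></order></orders>"
)


def from_csv(text):
    """Yield common-format records from structured CSV rows."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            "source": "csv",
        }


def from_xml(text):
    """Yield common-format records from semi-structured XML."""
    for order in ET.fromstring(text).iter("order"):
        yield {
            "customer_id": int(order.findtext("customer")),
            "amount": float(order.findtext("total")),
            "source": "xml",
        }


def integrate(csv_text, xml_text):
    """Combine both sources into one list sorted by customer id."""
    records = list(from_csv(csv_text)) + list(from_xml(xml_text))
    return sorted(records, key=lambda r: r["customer_id"])
```

Each source gets its own small adapter, and everything downstream works with a single record format. Unstructured sources such as social media comments would need a much more involved adapter than the two shown here.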


A relatively new type of data that you need to consider is
machine sensor data. For instance, the smart meters that utility
companies use to remotely measure energy consumption collect
sensor data. So do the hundreds of sensors incorporated into
modern automobiles, which monitor how your car is running
and log data immediately before an accident.
Chapter 4: Understanding the Benefits of Data Integration Tools


something like a jet engine has as many as 3,000 sensors. All
these sensors generate huge amounts of data. In fact, a typical
modern airliner can log anywhere from 100 gigabytes up to
half a terabyte on a flight. This is an awful lot of new data to
process.


Unfortunately, a lot of that new data isn’t very useful. For
example, it’s estimated that 98 percent of the sensor data alerts
recorded for that airline flight represent false positives. That
means an alert flagged a potential engine problem that
wasn’t actually a problem. You need a system that
can filter out the extraneous data so that you can focus on
the meaningful data. Otherwise, too much time would be
spent checking engine problems that aren’t actually problems,
the cost of maintenance would go up, and as a result, so
would the cost of an airline ticket! It’s pretty clear that having
someone go through half a terabyte of data manually for each
airline flight wouldn’t be reasonable, economical, or efficient.
Rather, you need an automated system that can quickly
analyze the data and ignore the garbage. This is where a data
integration system comes into play. Manual processes simply
can’t scale fast enough to meet these changing needs.
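
One simple way to picture automated filtering is a rule that ignores one-off spikes and reports a reading only when it stays above a limit for several samples in a row. The Python sketch below assumes that rule purely for illustration. It isn't how any particular airline or data integration product filters sensor alerts, and the function name, threshold, and sample readings are hypothetical.

```python
def persistent_alerts(readings, threshold, min_consecutive=3):
    """Return the start index of each run in which readings exceed
    the threshold for at least min_consecutive samples in a row."""
    alerts = []
    run_start = None
    run_length = 0
    for i, value in enumerate(readings):
        if value > threshold:
            if run_length == 0:
                run_start = i  # a new run of high readings begins here
            run_length += 1
            if run_length == min_consecutive:
                alerts.append(run_start)  # report each run only once
        else:
            run_length = 0  # the run was broken; start over
    return alerts


# Hypothetical temperature samples: one isolated spike, then a sustained rise.
SAMPLES = [70, 95, 71, 72, 96, 97, 98, 99, 70]
print(persistent_alerts(SAMPLES, threshold=90))
```

With these samples, the isolated spike at index 1 is ignored and only the sustained run that starts at index 4 is reported.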


In addition to machine sensor technology, data integration 
tools can help you scale up as your needs embrace future 
technologies. For example, many organizations are moving to 
cloud-based applications and storage. Good data integration 
tools support this type of move in a transparent and seamless 
manner. Essentially, your business analysts and IT developers 
can use the same data integration tools for in-house projects, 
for projects deployed using Hadoop, and for cloud-based 
projects. 


Another very important consideration is that with the right
tools, you can grow from small projects to enterprise-level
projects without having to move to something new. For example,
a good tool would be appropriate for entry-level data
integration projects, those smaller discrete projects, but have
the capability to move up and scale to support projects that
grow and become business critical. This consistency across
a broad range of capabilities means you don’t need to learn
new tools as you grow. Rather, you can leverage the knowledge
you’ve gained without having to start from scratch.
A big factor in getting data integration tools that can scale to
suit your future needs is remembering that the pace of technological
innovation doesn’t show any sign of slowing anytime
soon. Consider that, for example, virtually everything has
some connection to the Internet today, but just 20 years ago
the Internet was primarily a private playground for college
students and government researchers. Back then, even Bill
Gates missed how important the Internet would become. Ten
years ago, if you mentioned the cloud, pretty much everyone
would have assumed that you were talking about the weather.
Today, the Internet and the cloud have both become integral
parts of everyday life. Who knows what exciting new technology
is just around the corner? Fortunately, good data integration
tools will enable you to take advantage of the next big
thing, and to innovate and stay ahead of your competition,
without going back to square one.
development and execution options, as well as whatever
comes along next. Such support is vital in protecting your
investment. Things can change; new technologies come along
and can dominate quickly. So you want to make sure your tool
is flexible and future-proof.


Data Profiling 


With so many different sources of data involved, you need to 
have a means to make sure that your data is what you expect. 
It’s important that your tools allow a level of data profiling so 
that you can verify the data going into and out of your system, 
and ensure that you'll end up with the desired results. 
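
As a rough idea of what data profiling involves, the following Python sketch summarizes each column in a set of rows: how many values are missing, how many distinct values appear, and which value is most common. The profile function and the sample rows are hypothetical, and commercial tools report far more than this.

```python
from collections import Counter


def profile(rows):
    """Summarize each column: row count, missing values,
    distinct values, and the most common value."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing (an assumption).
        present = [v for v in values if v not in (None, "")]
        most_common = Counter(present).most_common(1)
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "top": most_common[0][0] if most_common else None,
        }
    return report


# Hypothetical rows pulled from a source system.
ROWS = [
    {"state": "CA", "age": 34},
    {"state": "CA", "age": None},
    {"state": "", "age": 51},
]
```

Running profile(ROWS) shows that one state value and one age value are missing. That is the kind of surprise you want to catch before the data flows into downstream reports.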


Data Quality 


Finally, you need to remember that poor data quality can sink
any project. It’s absolutely essential that your data integration
tools enable you to embed data quality into the data integration
process. After all, you know what to expect at the output
if garbage was the source!
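
Embedding data quality into the integration process can be pictured as a checkpoint that every row must pass before it is loaded. The Python sketch below sets bad rows aside along with the reasons they failed. The two rules (a required customer_id and a non-negative numeric amount) and all of the names are assumptions chosen for illustration.

```python
def check_row(row):
    """Return a list of quality problems in one row (empty if clean)."""
    problems = []
    customer_id = row.get("customer_id")
    if customer_id is None or not str(customer_id).strip():
        problems.append("missing customer_id")
    try:
        if float(row.get("amount")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        # A missing or non-numeric amount ends up here.
        problems.append("amount is not a number")
    return problems


def load_with_quality_gate(rows):
    """Split incoming rows into clean rows and rejected rows with reasons."""
    clean, rejected = [], []
    for row in rows:
        problems = check_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected
```

Keeping the rejected rows and their reasons, rather than silently dropping them, gives someone a chance to fix the problem at the source.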
• Data integration 101: why there’s so
much data today, what your business can
do with it, and how data integration helps
you use it


• Data integration challenges: the issues
you face when trying to combine data from
different sources


• Data integration benefits: how the
right data integration tools can help you
easily consolidate data sources and give
your business the agility it needs to be
competitive